EC-PSI: Associating Enzyme Commission Numbers with Pfam Domains
نویسندگان
چکیده
With the growing number of protein structures in the protein data bank (PDB), there is a need to annotate these structures at the domain level in order to relate protein structure to protein function. Thanks to the SIFTS database, many PDB chains are now cross-referenced with Pfam domains and enzyme commission (EC) numbers. However, these annotations do not include any explicit relationship between individual Pfam domains and EC numbers. This article presents a novel statistical training-based method called EC-PSI that can automatically infer high confidence associations between EC numbers and Pfam domains directly from EC-chain associations from SIFTS and from EC-sequence associations from the SwissProt, and TrEMBL databases. By collecting and integrating these existing EC-chain/sequence annotations, our approach is able to infer a total of 8,329 direct EC-Pfam associations with an overall F-measure of 0.819 with respect to the manually curated InterPro database, which we treat here as a “gold standard” reference dataset. Thus, compared to the 1,493 EC-Pfam associations in InterPro, our approach provides a way to find over six times as many high quality EC-Pfam associations completely automatically.
منابع مشابه
SCOPEC: a database of protein catalytic domains
MOTIVATION Domains are the units of protein structure, function and evolution. It is therefore essential to utilize knowledge of domains when studying the evolution of function, or when assigning function to genome sequence data. For this purpose, we have developed a database of catalytic domains, SCOPEC, by combining structural domain information from SCOP, full-length sequence information fro...
متن کاملdcGO: database of domain-centric ontologies on functions, phenotypes, diseases and more
We present 'dcGO' (http://supfam.org/SUPERFAMILY/dcGO), a comprehensive ontology database for protein domains. Domains are often the functional units of proteins, thus instead of associating ontological terms only with full-length proteins, it sometimes makes more sense to associate terms with individual domains. Domain-centric GO, 'dcGO', provides associations between ontological terms and pro...
متن کاملECOH: An Enzyme Commission number predictor using mutual information and a support vector machine
MOTIVATION The enzyme nomenclature system, commonly known as the enzyme commission (EC) number, plays a key role in classifying and predicting enzymatic reactions. However, numerous reactions have been described in various pathways that do not have an official EC number, and the reactions are not expected to have an EC number assigned because of a lack of articles published on enzyme assays. To...
متن کاملE-zyme: predicting potential EC numbers from the chemical transformation pattern of substrate-product pairs
MOTIVATION The IUBMB's Enzyme Nomenclature system, commonly known as the Enzyme Commission (EC) numbers, plays key roles in classifying enzymatic reactions and in linking the enzyme genes or proteins to reactions in metabolic pathways. There are numerous reactions known to be present in various pathways but without any official EC numbers, most of which have no hope to be given ones because of ...
متن کاملCascade PSI-BLAST web server: a remote homology search tool for relating protein domains
Owing to high evolutionary divergence, it is not always possible to identify distantly related protein domains by sequence search techniques. Intermediate sequences possess sequence features of more than one protein and facilitate detection of remotely related proteins. We have demonstrated recently the employment of Cascade PSI-BLAST where we perform PSI-BLAST for many 'generations', initiatin...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015